Introduction: Despite the availability of effective treatments, the majority of patients with mantle cell lymphoma (MCL) eventually relapse. Large-scale plasma proteomics analysis has shown promise for identifying novel prognostic biomarkers, allowing for early clinical interventions in various cancers. In MCL, clinical biomarkers of poor prognosis, such as a high MCL international prognostic index (MIPI), are used to differentiate between high- and low-risk patients; however, biomarkers that accurately predict a high risk of early disease progression are still lacking. In this study, we aimed to evaluate rule-based machine learning (ML) to identify a blood protein profile of soluble biomarkers that could predict progression of disease within 24 months (POD24+) in MCL.

Methods: The main eligibility criteria were a diagnosis of MCL and the availability of blood samples in the U-CAN biobank. Plasma samples from 90 newly diagnosed (years: 2010-2020) MCL patients treated with first-line chemoimmunotherapy and/or Bruton tyrosine kinase inhibitors were processed for proteomics analysis using the Olink Explore panel, which targets 1460 proteins simultaneously. POD24+ and overall survival were used as outcome variables. Normality was assessed with the Shapiro-Wilk test, after which t-tests and Wilcoxon rank-sum tests were used to identify differentially expressed proteins between the POD24+ and POD24- patient groups. The dataset was first split into training (70%) and test (30%) sets. To reduce redundancy, a correlation filter was applied to exclude proteins with a correlation coefficient greater than 0.75. Random Forest (RF) with recursive feature elimination was used for dimension reduction and predictive performance assessment over 10 cross-validations repeated 10 times. The R.ROSETTA rule-based ML framework was applied to the best set of predictors identified to generate predictive rules for patients at high risk of early disease progression. The results were visualized using the VisuNet tool, depicting the number of rules each protein was involved in, the protein expression pattern and the relationships between the proteins. In addition, we evaluated each of the identified proteins within the network individually by expression pattern, impact on overall survival and MIPI scoring.
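
The correlation filter described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the use of absolute Pearson correlation, the greedy keep-first rule for deciding which protein of a correlated pair is dropped, the function names and the toy protein values are all assumptions made for illustration. Only the 0.75 threshold comes from the Methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant feature is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_filter(features, threshold=0.75):
    """Greedily keep features whose absolute correlation with every
    already-kept feature does not exceed the threshold.

    `features` maps a (hypothetical) protein name to its expression values.
    Assumption: features are visited in insertion order, so the first member
    of a highly correlated pair is the one retained.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression values for three hypothetical proteins across five samples.
toy = {
    "protein_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protein_b": [1.1, 2.0, 3.2, 3.9, 5.1],    # nearly collinear with protein_a
    "protein_c": [2.0, -1.0, 0.5, 3.0, -2.0],  # weakly correlated with protein_a
}
selected = correlation_filter(toy)
print(selected)
```

In this toy example protein_b is dropped because it is almost perfectly correlated with protein_a, while protein_c is retained. In a real analysis, a filter of this kind would presumably be fitted on the training set only, so that the test set does not influence feature selection.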

Results: A total of 83 patients (nPOD24+ = 24, nPOD24- = 59) with a median age of 71 (range: 49-86) were included in the study, of whom 22 were female (27%) and 61 were male (73%). Patients had a median follow-up time of four years (range: 0.1-17.3). Thirty-five patients (42%) were categorized as MIPI high-risk. Differential expression analysis of the investigated blood proteins yielded 314 significantly deregulated proteins between patients with POD24+ outcome vs POD24- outcome. After correlation filtering and RF modeling, a small pool of 67 proteins was selected. Here, our rule-based ML approach identified a network of four correlated protein nodes, for a total of 21 proteins with the strongest set of rules. This network revealed co-predictive mechanisms superior to currently established clinical parameters: it predicted POD24+ patients with 88% accuracy in the test cohort, whereas MIPI predicted POD24+ with an accuracy of only 63%. Of these proteins, 5 were underexpressed (UE) and 11 were overexpressed (OE) in MIPI category 3 versus MIPI category 1. Additionally, 13 of these proteins were associated with poor overall survival. In the core of our protein expression network, LGALS4 (OE), KIT (UE) and PRSS8 (OE) were involved in the highest number of significant rules within the POD24+ prediction model. In addition, LGALS4, IL24, FGFR2, PRSS8, GAS6 and CDH2 had the highest decision coverage, defined as the percentage of patients contributing to the rule. Furthermore, LGALS4 (OE), CDH2 (OE) and FGFR2 (OE) were the strongest contributors to the predictive power of the model. Using the top seven rules, the protein expression network could also correctly identify patients with overall survival shorter than the cohort median with 90% accuracy.

Conclusions: Taken together, our rule-based ML approach generated a protein expression profile that can accurately identify patients with a high risk of POD24 and a poor survival outcome individually. Following validation in an independent patient cohort, this protein profile could be used in a clinical setting to identify patients at high risk of early disease progression and thereby improve treatment decision-making.

Disclosures

Molin:Roche: Honoraria. Hellström:Novartis: Research Funding; Abbvie: Honoraria; Incyte: Honoraria. Glimelius:Takeda: Honoraria, Other: Research Grant/Funding; Janssen: Speakers Bureau; AstraZeneca: Consultancy.
